ABSTRACT

The major purpose of this study is to further the
development of procedures which minimize current limitations of
creativity instruments, thus yielding a reliable and functional means
for assessing creativity. Computerized content analysis and multiple
regression are employed to simulate the creativity ratings of trained
judges. The computerized scoring procedure is evaluated in a
cross-validation sample. The methodological problems of establishing
a reliable criterion and generating parsimonious forced prediction
models through predictor stability analysis are emphasized, and
possible solutions are explored. Support for the proposed solutions
and empirical validation are incorporated in the study. The forced
model results are regarded as tentative and should be tested on a new
sample. Bibliography and statistical data are included. (Author/AE)







COMPUTER SIMULATION OF HUMAN BEHAVIOR: 
ASSESSMENT OF CREATIVITY 



John F. Greene 

The University of Bridgeport 






Presented at the symposium on 

Multiple Regression Prediction Models in the Behavioral Sciences

AERA Annual Meeting, 1971
New York, New York 







This report is divided into three sections. In Section I a review of
the basic research study is presented. The study represents the third stage
of ongoing research in the field of scoring creativity tests by computer. In
stage one, Dieter Paulus and Joseph Renzulli generated the idea of conducting
such research and demonstrated its feasibility. Computerized scoring
procedures for three creativity tasks were developed by Francis Archambault
during the second stage. The last two sections of this report consider the
methodological problems of establishing reliable criteria and generating
parsimonious prediction models in the behavioral sciences as related to this
study.

I. Review of the Original Research Study 

The major purpose of this study was to further the development of
procedures which minimize the current limitations of creativity instruments,
thus yielding a more reliable and functional means for assessing creativity.
Computerized content analysis was employed to simulate the creativity ratings
that trained human judges make in the process of scoring the free,
open-ended responses to the Torrance Tests of Creative Thinking (TTCT)
(Torrance, 1966a). The Verbal, Form A version of the TTCT served as the basic
source upon which reliable and functional scoring strategies were developed,
but these strategies are not necessarily limited to this test battery. Form A
of the TTCT consists of seven activities. Only the last four, however, were
considered in this study. Activities four through seven are Product
Improvement (toy elephant), Unusual Uses (cardboard boxes), Unusual Questions
(cardboard boxes), and Just Suppose (if clouds had strings, what would
happen?), respectively.
Each activity is scored for three dimensions of creativity: fluency,
flexibility, and originality. A flexibility score, however, is not determined
for the sixth activity, Unusual Questions.

The scoring procedures remain constant throughout the first five
activities of the TTCT. Different techniques, however, are employed in
activities six and seven. The computerized scoring strategies for activities
four and five included elaborate dictionaries. These dictionaries were
constructed using the categories provided by Torrance as the basic structure.
Activities six and seven present the unique challenges of determining the
complexity of the answer given only the question and detecting shifts in
attitude or focus between responses, respectively. The computerized scoring
procedures developed for these two tasks are beyond the scope of this paper.
Actuarial variables were employed to supplement prediction in all four
activities.

From a sample of 375 students used in a study by Treffinger and Ripple
(1960), 153 subjects were randomly selected. Four judges rated the responses
of each subject. Analysis of variance procedures were used to provide a
reliability estimate of the pooled ratings of the judges (Winer, 1962, pp.
124-132). A step-wise multiple regression technique was employed to maximize
the prediction of each subject's scores for each activity. The predictors
included the actuarial and dictionary parameters generated earlier by the
SCRTXT computer program (Fisher, 1968). Besides the full model, restricted
and forced regression models were generated. The entire computerized scoring
procedure was then evaluated in a cross-validation sample.

The adjusted pooled reliability estimates based on all four judges for
fluency and flexibility were most satisfactory, ranging from .80 to .99 with
6 of 7 above .94. The originality reliabilities, although satisfactory, were
somewhat lower. Their range was bounded by .66 and .86.

The results of the multiple correlation analyses are summarized in
Table I. The full model coefficients for fluency range from .92 to .99.
The range for flexibility is .84 to .91. The mult-R's for originality are
.84, .07, .80 and .73 for activities 4-7, respectively.

The restricted model results parallel those of the full model. In all
but one equation, the multiple correlation coefficient dropped by less than
one-hundredth of a point. The greatest loss in potential predictability was
realized in Activity 7, originality, where a .03 difference was noted. In
these restricted analyses, no more than half of the original set of predictors
was utilized, with 4 instances of using as few as 5 or 6 predictors. The
apparent lack of significant losses in prediction power with a partial set
of predictors has important implications for future research. Furthermore,
higher cross-validation correlations may be expected because of the reduction
in the number of predictor variables.

Greater losses in the multiple-R coefficient were detected for the two
forced models. Activity 7 flexibility dropped from .84 to .73, and a .13
decline from .73 was noted for the Activity 7 originality forced model.
These models were generated, however, because of low cross-validations in
their respective full and restricted models, as will be shown soon. Thus,
while lower multiple correlations were obtained, higher cross-validation
correlations are expected. One advantage of the particular forced models
considered is that they employ only 3 and 4 predictors.

All the multiple correlation coefficients reported are high and
significant beyond the .01 level. Before speculating on the true value of these
results, however, the validity of the prediction equations will be estimated.

The attenuated cross-validation correlations also appear in Table I. The
cross-validation correlations for the first nine equations of the full and
restricted models range from .79 to .96. Each is significant beyond the .01
level, but, more importantly, each one indicates that the corresponding
equation is capable of excellent prediction. The shrinkage, or difference
between the multiple correlation and the cross-validation correlation, is
minimal, never exceeding .10. Thus, the high result level anticipated after
considering the multiple regression analyses was in fact achieved for the
first nine equations.
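
Shrinkage as defined above can be illustrated with a short sketch. The Python code below is not the procedure used in the study. It assumes ordinary least-squares weights, the NumPy library, and hypothetical function names (fit_weights, predict, shrinkage), and it omits any correction of the correlations for criterion unreliability.

```python
import numpy as np


def fit_weights(X, y):
    """Least-squares regression weights (intercept first) from the development sample."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply previously estimated weights to a set of predictor scores."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def shrinkage(X_dev, y_dev, X_cv, y_cv):
    """Return (multiple R, cross-validation r, shrinkage).

    The weights are estimated in the development sample only and are then
    applied unchanged to the cross-validation sample.
    """
    beta = fit_weights(X_dev, y_dev)
    multiple_r = np.corrcoef(predict(beta, X_dev), y_dev)[0, 1]
    cross_r = np.corrcoef(predict(beta, X_cv), y_cv)[0, 1]
    return multiple_r, cross_r, multiple_r - cross_r
```

Because the weights are never re-estimated in the second sample, the cross-validation correlation reflects only what the development equation carries over to new subjects.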

Considerable shrinkage was noted in both the full and restricted models
for the flexibility and originality dimensions of Activity 7. The attenuated
cross-validation correlations of .56 and .43 in the full model and .59 and .62
in the restricted model certainly are at least of moderate value in view of
the present state of the art (Page and Paulus, 1968; Archambault, 1969);
however, in comparison to the results of equations one through nine, they
are somewhat disappointing. Hence, additional analyses were conducted,
and a third model, the forced model, was generated.

As expected, the attenuated cross-validation correlations for the forced
models exceeded the corresponding correlations in the first two models.
Correlations of .77 and .70 for the flexibility and originality equations in
Activity 7 were established. Of course, these results are tentative, and must
be tested in a new sample. They represent a goal of minimum stature for
future researchers.

The results of the multiple correlation analyses and the corresponding
cross-validation correlations must be considered most encouraging. Accurate
estimates of the creativity ratings of human judges were achieved by
employing computerized content analysis and computer simulation procedures.
Perhaps a more significant outcome, however, is the reliability of this
automated process. If these same responses were to be rated at a later time,
the computer ratings would have a reliability of 1.00. Certainly human
judges would not approach this perfection. The success achieved thus far
in developing a computerized scoring procedure for creativity tests strongly
suggests that similar applications in other areas where open-ended responses
are analyzed by human judges are warranted. These other areas include
personality and interest tests.

II. Criterion Development

The procedures in this section were employed to establish a reliable
criterion for each dimension of each activity. Four judges, rather than a
single judge, were used in an effort to maximize the reliabilities of the
ratings. The judges were thoroughly trained. In addition, several
statistical methods were explored.

The Judges 

One professional judge and three educational psychology students were 
responsible for scoring the responses of the 153 subjects to the TTCT. A 
professional judge is defined as a judge employed by a test scoring bureau. 
Of the three trained student judges, one was a first year graduate student, 
and the other two were completing their third year of undergraduate work. 

Procedures for Training Judges

It is assumed that the professional judge was trained in the manner
prescribed by Torrance (1966b, 1966c). The student judges were trained in the
following fashion by Archambault (1969, p. 30):

To provide uniformity of orientation and to improve 
inter-scorer reliability, a number of procedures were 
utilized in the training of the judges. 

To give a greater appreciation for the concept of 
creativity by becoming actively involved in the creative 
process, each judge was administered the Torrance Tests 
of Creative Thinking, Verbal Form A. Next, a series
of seminars were conducted for the scorers during 
which the process of creativity and possible problems 
relating to the scoring procedures were discussed. 

The scorers were then provided with copies of Torrance’s 
Guiding Creative Talent (1962) and were asked to read 
selected chapters. Copies of the Torrance Tests of Creative
Thinking: Norms-Technical Manual (Torrance, 1966c) and
the Torrance Tests of Creative Thinking: Directions
Manual and Scoring Guide (Torrance, 1966b) were also provided.

After the literature and manuals had been read, the 
judges were asked to score a sample set of responses 
listed in the Scoring Guide. The scorers then met 
as a group and discussed their rationale for assigning
scores to each of the individual responses. Where
differences of opinion existed between the judges
and the Scoring Guide, the possible reasons for such
differences were analyzed. As a final Activity in
the training process, a meeting was arranged between 
the scorers and Dr. E. Paul Torrance. During this 
meeting the scorers had the opportunity to raise 
any unresolved questions emanating from the practice 
scoring which they had performed. 

Additional steps taken to improve reliability included: 
a) a discussion of the optimal amount of time for
scoring in any one sitting; b) the provision of a
"paste-up" of the scoring manual that enabled the
scorers to view one Activity or sub-test at a glance;
and c) the scoring of the responses of all subjects 
to one Activity before proceeding to the next Activity. 

Reliability of Judges 

Several statistical analyses were performed in an effort to maximize
the reliabilities of the creativity ratings. Initially, this writer
developed a cycling type of reliability computer program which generated a
reliability estimate of the pooled ratings of the judges using analysis of
variance techniques (Winer, 1962, pp. 124-132) for all four judges and for all
combinations of three judges. The main purpose of this program was to
determine if any one judge's ratings should be deleted. The program also
generated adjusted reliabilities. These adjusted reliabilities, generally
higher than the unadjusted estimates, eliminate the effect of differences in
judges' means and should be utilized when the investigator is not willing to
accept the assumption of mean homogeneity (Ebel, 1951).
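
The cycling analysis described above can be sketched as follows. The original program is not reproduced in this report, so the Python code below is only an illustration. It assumes the usual two-way analysis of variance formulation, in which the reliability of the pooled ratings is 1 minus the ratio of an error mean square to the between-subjects mean square, and in which the adjusted estimate removes the between-judges sum of squares from the error term. The function names and the NumPy dependency are assumptions made for this example.

```python
import numpy as np


def pooled_reliability(ratings):
    """Reliability of the pooled ratings of k judges.

    ratings: array of shape (n_subjects, k_judges).
    Returns (unadjusted, adjusted). The adjusted value removes
    differences in judges' means from the error term.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_judges = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ms_subjects = ss_subjects / (n - 1)
    # Within-subjects error still contains judge mean differences.
    ms_within = (ss_total - ss_subjects) / (n * (k - 1))
    # Residual error has judge mean differences removed.
    ms_residual = (ss_total - ss_subjects - ss_judges) / ((n - 1) * (k - 1))
    return 1 - ms_within / ms_subjects, 1 - ms_residual / ms_subjects


def cycle_judges(ratings):
    """Reliabilities for all judges (code 0) and with each judge omitted in turn.

    The key is the judge code: 0 means no judge was dropped,
    j means judge j (1-based) was not considered.
    """
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]
    results = {0: pooled_reliability(x)}
    for dropped in range(k):
        keep = [j for j in range(k) if j != dropped]
        results[dropped + 1] = pooled_reliability(x[:, keep])
    return results
```

If the judges differ only by a constant amount in their means, the adjusted estimate in this sketch is 1.00 while the unadjusted estimate is lower, which is the effect the adjustment is meant to remove.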

The means and standard deviations for the four judges are presented in 
Table II. Table III contains the judge inter-correlation matrix. The trained 
student judges are 1, 2, and 3; judge 4 is the professional scorer. 

The results of the cycling program are presented in Table IV. The
judge code parameter indicates which judge was not considered in the
particular analysis. Judge code "0" indicates that all four judges were
considered. Two statements are based on the results. First, a function of all
four judges' ratings will constitute the criterion, because generally the
highest reliabilities are generated when all four judges are considered.
Second, it is appropriate to utilize the adjusted reliability estimates
for the originality dimension.

The judge code "0" adjusted reliabilities for fluency and flexibility
are most satisfactory, ranging from .88 to .99 with 6 of 7 above .94. The
originality reliabilities, although satisfactory, were somewhat lower. Their
range was bounded by .66 and .86. Thus, additional statistical methods were
applied to the originality ratings, as suggested by Page and Paulus (1968).
This involved factor analyzing the raters on the originality scores and
determining their factor scores on the first principal component. The factor
scores, or some function of them, are then used to differentially weight the
raters. As can be seen by examining the results in Table V, the factor scores
generated for each scorer were not considerably different. Even if a power
function were applied to these loadings, the resulting composite score
probably would not differ greatly from a simple average of the scores. Hence,
the criterion scores for originality as well as fluency and flexibility were
the mean of the four judges' ratings.
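
The weighting idea considered above can be illustrated with a brief sketch. It is not the analysis actually run for Table V. It assumes that the first principal component is extracted from the judge inter-correlation matrix, that each judge's standardized ratings are weighted by that judge's loading raised to a chosen power, and that NumPy is available. The function names are hypothetical.

```python
import numpy as np


def first_component_loadings(ratings):
    """Loadings of each judge on the first principal component
    of the judge inter-correlation matrix."""
    x = np.asarray(ratings, dtype=float)
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # The sign of an eigenvector is arbitrary; orient it positively.
    return -loadings if loadings.sum() < 0 else loadings


def weighted_composite(ratings, power=1.0):
    """Composite of standardized ratings, weighted by loading ** power.

    Assumes all loadings are positive, as they are when every
    pair of judges is positively correlated.
    """
    x = np.asarray(ratings, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    weights = first_component_loadings(x) ** power
    return z @ (weights / weights.sum())
```

When the loadings are similar across judges, a composite built this way correlates very highly with the simple mean of the judges' ratings, which is the situation described in the text.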

III. Generating Parsimonious Prediction Models Through Predictor Stability



Forced regression models were generated in this study primarily to
determine the potential of predicting those creativity scores for which
considerable shrinkage was noted in the full and restricted models. A forced
model is one in which the researcher selects a partial set of predictors and
forces them into the analysis before the remaining variables of the full set
are allowed to enter. If the forced model is to differ from the full model, it
must also be restricted.
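
A forced, restricted model of this kind can be sketched in a few lines. The code below is an illustration only, not the step-wise program used in the study. It assumes that the restriction is a cap on the total number of predictors (max_predictors) and that, after the forced variables are entered, each further variable is the one that most increases the multiple correlation. Both choices, the function names, and the NumPy dependency are assumptions made for this example.

```python
import numpy as np


def multiple_r(X, y, cols):
    """Multiple correlation of y with the predictors in cols.

    cols must contain at least one column index.
    """
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(design @ beta, y)[0, 1]


def forced_model(X, y, forced, max_predictors):
    """Enter the forced predictors first, then add remaining predictors
    one at a time until max_predictors is reached.

    Returns (selected column indices in order of entry, multiple R).
    """
    selected = list(forced)
    remaining = [c for c in range(X.shape[1]) if c not in selected]
    while remaining and len(selected) < max_predictors:
        # Candidate whose entry yields the largest multiple R.
        best = max(remaining, key=lambda c: multiple_r(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected, multiple_r(X, y, selected)
```

Setting max_predictors equal to the number of forced variables yields a model made up of the forced predictors alone.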

Before considering the process of selecting the forced predictor variables,
the rationale for using this type of model will be discussed. In multiple
regression analysis, only the full model reflects the present state of the
art. Restricted and forced models represent goals to be attained in future
research, and each of these models must be applied to a new sample if it is to
be recognized as valid. Thus, when working with models other than the full
model, the researcher need not necessarily restrict his efforts to only the
development sample. He must realize, however, that the results are tentative
and based on the assumptions, however implicit, corresponding to his method of
generating the restricted or forced model.

In this study, the forced predictor variables were selected by analyzing
the correlations between the predictors and the criterion in both the
development and cross-validation samples. Only those predictors whose
correlations with the criterion did not vary significantly were selected.
Referring to this selection technique as predictor stability analysis, this
writer recognizes the following aspects: 1) As mentioned earlier, the results
obtained must be viewed as representing the future and not the present state
of the art. 2) This process does not eliminate the inclusion of suppressor
predictors. 3) Other researchers have commented on the situation under
discussion. Perhaps Page and Paulus best described the problem when they
stated (1968, p. 53):



As is well known, however, we should not expect all
of this accuracy (high multiple regression coefficients)
if we took new essays and applied the discovered beta
weightings to them, to predict their human ratings
(cross-validation). For any set of scores, or any set
of resultant correlations, contains not only true
variance associated with the variable, but also a
certain amount of error variance, random for the
particular subjects concerned, which will not ordinarily
be found with a new set of human subjects, or essays.

The true variance gives us information which will be
subsequently useful. But the error variance is also
capitalised upon by the analysis, and a certain
portion of the multiple-regression coefficient, and
of the contributing beta weights, will spuriously
seem to contribute, but will not stand up in a
replication.

4) This process, then, is actually an attempt to control the error
variance referred to in the above quotation. 5) The stable predictors may be
determined by partitioning only the development sample, but if the regression
analysis has already been completed, validation in a new experiment is still
necessary. Thus, if a researcher is concerned with the stability of his
predictors as related to the criterion, analyses regarding this stability must
be performed prior to the generation of the full model. 6) The stability of
the correlation may be statistically established by testing for a significant
difference between two correlations, as discussed in several texts (e.g.,
Bruning and Kintz, 1968). However, it should be noted that the test becomes
more rigorous by increasing the alpha level. 7) The importance of empirical
cross-validation rather than generating a statistical estimate for the
behavioral sciences is once again supported. The cross-validation estimates
calculated by the Wherry and the Lord-Nicholson formulae are not sensitive
to the spurious effect of the error variance and hence are generally
optimistic over-estimates.
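
The stability test mentioned in point 6 can be sketched as follows. The report does not state which test of the difference between two correlations was applied, so the code below assumes the common Fisher z transformation for two independent samples. The function names, the default alpha, and the sample sizes in the usage note are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Test of the difference between two independent correlations.

    Returns (z statistic, two-sided p value) using Fisher's
    z transformation; requires n1 > 3 and n2 > 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


def stable_predictors(r_dev, r_cv, n_dev, n_cv, alpha=0.05):
    """Indices of predictors whose correlation with the criterion does not
    differ significantly between the development and cross-validation samples."""
    return [
        i
        for i, (a, b) in enumerate(zip(r_dev, r_cv))
        if fisher_z_difference(a, n_dev, b, n_cv)[1] >= alpha
    ]
```

For example, with hypothetical samples of 80 and 73 subjects, stable_predictors([0.60, 0.60, 0.10], [0.58, 0.10, 0.12], 80, 73) retains the first and third predictors. Because a predictor is retained only when the difference is not significant, raising alpha makes the screen stricter, in line with the remark in point 6.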

The attenuated cross-validation results for the forced models in this
study were very encouraging. Correlations of .77 and .70 for the flexibility
and originality equations in Activity 7 were established. The corresponding
full model results were .56 and .48. Of course, the forced model results
are tentative, and must be tested in a new sample. They represent a goal of
minimum stature for future researchers.

TABLE II

MEANS AND STANDARD DEVIATIONS FOR JUDGES

Orig.   .86   1.07   1.72   1.51   .95   1.36   .91   1.36

TABLE III

JUDGE INTER-CORRELATIONS
TOTAL SAMPLE

                        r for judges x-y
Dimension       1-2     1-3     1-4     2-3     2-4     3-4

Fluency         .93     .93     .88     .96     .95     .91
Flexibility     .85     .85     .83     .89     .87     .82
Originality     .69     .77     .61     .69     .83     .55

Fluency         .91     .85     .84     .86     .85     .69
Flexibility     .89     .90     .80     .91     .84     .78
Originality     .67     .67     .70     .62     .66     .44

Fluency         .96     .94     .96     .97     .99     .98
Originality             .27     .37     .42     .44     .16*

Fluency         .84     .76     .87     .69     .79     .82
Flexibility     .58     .70     .66     .59     .58     .74
Originality     .47     .50     .27     .53     .51     .27

*Significant at .05 level
All other correlations are significant at the .01 level

TABLE IV

RELIABILITY ESTIMATES FOR ALL JUDGES
AND ALL COMBINATIONS OF THREE JUDGES
USING ANALYSIS OF VARIANCE
TOTAL SAMPLE

Activity   Dimension     Judge Code   Rel.   Adj. Rel.

4          Fluency       0            .98    .98
                         1            .97    .98
                         2            .96    .97
                         3            .97    .97
                         4            .98    .98

4          Flexibility   0            .96    .96
                         1            .94    .95
                         2            .93    .94
                         3            .94    .95
                         4            .95    .95

4          Originality   0            .76    .86
                         1            .83    .87
                         2            .64    .77
                         3            .65    .83
                         4            .61    .81

5          Fluency       0            .95    .95
                         1            .92    .92
                         2            .92    .92
                         3            .94    .95
                         4            .95    .95

5          Flexibility   0            .96    .96
                         1            .94    .94
                         2            .93    .93
                         3            .94    .94
                         4            .96    .96

5          Originality   0            .64    .80
                         1            .58    .75
                         2            .43    .67
                         3            .66    .79
                         4            .45    .74



TABLE IV CONTINUED

Activity   Dimension     Judge Code   Rel.   Adj. Rel.

6          Fluency       0            .99    .99
                         1            .99    .99
                         2            .99    .99
                         3            .99    .99
                         4            .98    .99

6          Originality   0            .60    .66
                         1            .56    .61
                         2            .44    .48
                         3            .53    .62
                         4            .53    .60

7          Fluency       0            .94    .94
                         1            .91    .91
                         2            .93    .93
                         3            .94    .94
                         4            .90    .91

7          Flexibility   0            .88    .80
                         1            .84    .84
                         2            .89    .89
                         3            .32    .32
                         4            .85    .84

7          Originality   0            .71    .74
                         1            .65    .70
                         2            .60    .60
                         3            .61    .68
                         4            .68    .74



TABLE V

FACTOR ANALYSIS OF JUDGES
ORIGINALITY SCORES
TOTAL SAMPLE

Activity   X       Eigenvalues   Cumulative %   Loading   r     r Adj.

4          2.37    3.07          77             .37       .76   .86
           6.59    .55           90             .52
           5.03    .24           96             .86
           6.05    .15           100            .85

5          3.04    2.03          72             .90       .64   .80
           11.13   .56           86             .87
           3.23    .33           95             .00
           10.72   .22           100            .02

6          1.32    2.04          51             .71       .60   .66
           4.40    .05           72             .01
           2.87    .64           80             .62
           1.33    .47           100            .69

7          .83     2.20          57             .74       .71   .74
           1.72    .82           78             .85
           .95     .51           90             .77
           .91     .39           100            .65